Using Sequence Homology to Filter High-throughput Protein-protein Interaction Data
Abstract
Protein-protein interaction data obtained from high-throughput experiments are thought to contain a large number of false positives, i.e., spurious interactions that do not occur in the cell. This fraction is estimated to be as high as 50% in yeast [3]. Hence, it is important to quantify the reliability of these interactions and to identify the true positives, i.e., those that actually occur in the cell. In this study, we show that an interaction can be judged as true if the interacting proteins have homologs that interact in one or more species. We used protein-protein interaction data for E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens from the February 2004 version of the Database of Interacting Proteins (DIP) [2]. PSI-BLAST was used to detect sequence homologs. We first validated our hypothesis in yeast, using a Bayesian approach to estimate likelihood ratios for high-throughput interactions with and without homologs from the numbers of known true positives and false positives. Based on these results, we estimate the number of true positives in the high-throughput data sets of S. cerevisiae, C. elegans, and D. melanogaster. We have also created a Database of Homologous Interactions (http://www.pdbj.org/dhi) to display our results and to provide access to the homologous interactions studied.
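The likelihood-ratio step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the example counts, and the 50% prior used in the usage lines are assumptions chosen for illustration (the prior loosely echoes the false-positive estimate cited for yeast). The sketch assumes the likelihood ratio for a feature is the fraction of known true positives showing it divided by the fraction of known false positives showing it, and that posterior odds equal prior odds multiplied by that ratio.

```python
def likelihood_ratio(tp_with, tp_total, fp_with, fp_total):
    """Ratio P(feature | true) / P(feature | false), estimated from counts.

    tp_with / tp_total: known true positives showing the feature / all of them.
    fp_with / fp_total: known false positives showing the feature / all of them.
    """
    if tp_total <= 0 or fp_total <= 0:
        raise ValueError("reference sets must be non-empty")
    p_given_true = tp_with / tp_total
    p_given_false = fp_with / fp_total
    if p_given_false == 0:
        raise ValueError("no false positives show the feature; ratio undefined")
    return p_given_true / p_given_false


def posterior_probability(prior, lr):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    posterior_odds = prior / (1 - prior) * lr
    return posterior_odds / (1 + posterior_odds)


def expected_true_positives(n_with, n_without, prior, lr_with, lr_without):
    """Expected number of true interactions in a data set split by the feature."""
    return (n_with * posterior_probability(prior, lr_with)
            + n_without * posterior_probability(prior, lr_without))


# Hypothetical counts, for illustration only (not taken from the study).
lr_with = likelihood_ratio(80, 100, 10, 100)     # feature: interacting homologs found
lr_without = likelihood_ratio(20, 100, 90, 100)  # feature: no interacting homologs
estimate = expected_true_positives(300, 700, 0.5, lr_with, lr_without)
print(f"LR with: {lr_with:.2f}, LR without: {lr_without:.2f}, "
      f"expected true positives: {estimate:.1f}")
```

A likelihood ratio above 1 raises the probability that an interaction is real, and a ratio below 1 lowers it, which is why interactions with interacting homologs would score as more reliable under these assumptions.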
Publication date: 2004